Add cache memory limit for syncer of drainer #715
Conversation
/run-all-tests
/run-all-tests
@july2993 PTAL
Find a simpler implementation; don't start a goroutine for every Add and Pop op.
Force-pushed from 38471d8 to df57764
@july2993 All the goroutines have been removed.
I think OOM is unavoidable in some cases. For example, the machine running drainer may have only 4 GB of memory, but the average size of a sequence of binlogs is more than 0.4 GB; then even with this new configuration there wouldn't be enough memory, because there's a buffered channel with size 10 for each pump instance.
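To make the arithmetic in this comment concrete: with a buffered channel of size 10 per pump and an average binlog size of 0.4 GB, one pump's channel buffer alone can account for roughly 4 GB. The numbers below are taken from the comment, not verified against the code:

```go
package main

import "fmt"

func main() {
	// Figures from the comment above (assumptions, not measured values).
	const avgBinlogGB = 0.4 // average size of one binlog in the sequence
	const chanSize = 10     // buffered channel capacity per pump instance

	// Worst case, one pump's channel buffer alone holds this much data.
	fmt.Printf("one pump's buffered channel alone: %.1f GB\n", avgBinlogGB*chanSize)
}
```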
@lichunzhu you can file another PR to change the default buffer size first.
Note that the unit of
@suzaku IMO,
Current
@suzaku PTAL
I think the problem with introducing this new configuration is that it can't effectively solve the OOM problem. The "cache" is one of the largest buffers with the default configuration, but it's by no means the only data structure that may eat up a lot of memory. It's "obviously" the bottleneck because it's 65536 by default; with a big worker-count and batch-size the bottleneck would be elsewhere. This limitation adds complexity to both our code and the user interface, yet can't actually solve the problem, so I suggest we don't add a limitation that applies to only one single data structure.
@suzaku If a client's binlogs are all extremely huge, I think this kind of limitation can take effect. The number of cached binlogs can't accurately represent the memory usage of syncer.
But if the binlogs are all extremely large, then other data structures would still eat up the memory and cause OOM. |
What problem does this PR solve?
TOOL-1480
The cached binlog count can't accurately represent the memory usage of drainer. To compute the memory usage more precisely, and to avoid a single binlog item consuming huge memory and causing OOM, we need to measure every binlog item's memory usage and limit the sum.
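To make the idea concrete, a per-item size function can simply sum the lengths of the item's payload fields. This is a minimal sketch with assumed field names, not the actual `binlogItem.Size` implementation from the PR:

```go
package main

import "fmt"

// binlogItem is a simplified stand-in for drainer's binlog item;
// the field names here are assumptions for illustration only.
type binlogItem struct {
	prewriteKey   []byte
	prewriteValue []byte
}

// Size estimates the item's memory footprint by summing the
// lengths of its payload byte slices.
func (b *binlogItem) Size() int {
	return len(b.prewriteKey) + len(b.prewriteValue)
}

func main() {
	it := &binlogItem{
		prewriteKey:   make([]byte, 16),
		prewriteValue: make([]byte, 1024),
	}
	fmt.Println(it.Size()) // 1040
}
```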
What is changed and how it works?
- Add a new configuration item `maxBinlogCacheSize` to limit the maximum cached binlog size. This variable can be modified by users. The default size is 4GB.
- Measure each binlog item's memory usage from `binlog.PrewriteKey`, `PrewriteValue`, etc. The size is computed by `binlogItem.Size`, which is a newly added function.
- Use `sync.Cond` to block when memory usage is too big. In `binlogItemCache.Push`, if adding a new `binlogItem` would exceed the memory limit, it uses `cond.Wait()` to block itself. Otherwise, it adds the item's size to `cachedSize` and caches the `binlogItem` in the `binlogItemCache.cacheChan` channel. In `binlogItemCache.Pop`, after picking a `binlogItem`, it subtracts the item's size from `cachedSize` and uses `cond.Signal()` to notify `Push` to check whether it's OK to cache a new `binlogItem`.
- `TestSyncerCachedSize` is added to `syncer_test.go`. In this test we try to add 10 `binlogItem`s, each with `size` 1. `maxBinlogCacheSize` is temporarily set to 1, and a goroutine keeps checking every 0.05s whether `cachedSize` exceeds `maxBinlogCacheSize`.
- For a `binlogItem` whose memory usage is bigger than the maximum cache size, we wait for the `syncer.input` chan to drain all cached `binlogItem`s and then cache this one, to avoid it occupying memory for a long time.
- After `syncer` stops receiving `binlogItem`s from `syncer.input`, if `collector` keeps using `syncer.Add` to add `binlogItem`s into the `syncer.input` channel, it may get blocked once `cachedSize` exceeds `maxBinlogCacheSize`. So a variable `quiting` is added to signify that the syncer is quitting. If `syncer.quiting == true`, `syncer.Add` will never run `cond.Wait`, to avoid being blocked.

Check List
Tests
Code changes
Side effects
Related changes
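The Push/Pop flow described in the PR description (block in `Push` via `sync.Cond` when `cachedSize` would exceed the limit; subtract and signal in `Pop`) can be sketched as below. This is a minimal illustration with simplified types, not the actual drainer code; the oversized-item case is approximated by admitting an over-limit item only when the cache is empty:

```go
package main

import (
	"fmt"
	"sync"
)

// item is a simplified stand-in for drainer's binlogItem.
type item struct{ size int64 }

// boundedCache caches items in a channel while bounding the total
// cached size with a sync.Cond, mirroring the mechanism in the PR.
type boundedCache struct {
	mu         sync.Mutex
	cond       *sync.Cond
	cachedSize int64 // sum of sizes of items currently cached
	limit      int64 // analogue of maxBinlogCacheSize
	ch         chan *item
}

func newBoundedCache(limit int64, capacity int) *boundedCache {
	c := &boundedCache{limit: limit, ch: make(chan *item, capacity)}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// Push blocks while adding the item would exceed the limit; an item
// larger than the limit is still admitted once the cache is empty,
// so it cannot wait forever.
func (c *boundedCache) Push(it *item) {
	c.mu.Lock()
	for c.cachedSize+it.size > c.limit && c.cachedSize > 0 {
		c.cond.Wait() // wait until Pop frees enough space
	}
	c.cachedSize += it.size
	c.mu.Unlock()
	c.ch <- it
}

// Pop takes an item, subtracts its size, and wakes a blocked Push.
func (c *boundedCache) Pop() *item {
	it := <-c.ch
	c.mu.Lock()
	c.cachedSize -= it.size
	c.cond.Signal() // let a waiting Push re-check the limit
	c.mu.Unlock()
	return it
}

func main() {
	c := newBoundedCache(4, 10)
	go func() {
		for i := 0; i < 8; i++ {
			c.Push(&item{size: 1}) // blocks once 4 items are cached
		}
	}()
	for i := 0; i < 8; i++ {
		c.Pop()
	}
	fmt.Println("drained, cachedSize =", c.cachedSize) // drained, cachedSize = 0
}
```

A real syncer would also need the `quiting` flag from the PR so that `Push` stops waiting during shutdown; that is omitted here for brevity.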